Smart Paper Notes

Pixel2ISDF: Implicit Signed Distance Fields based Human Body Model from Multi-view and Multi-pose Images

Jianchuan Chen , Wentao Yi , Tiantian Wang , Xing Li , Liqian Ma , Yangyu Fan , Huchuan Lu

Category: Computer Vision

2022-12-06

In this report, we focus on reconstructing clothed humans in the canonical space, given multiple views and poses of a human as input. To achieve this, we use the geometric prior of the SMPLX model in the canonical space to learn an implicit representation for geometry reconstruction. Based on the observation that the posed mesh and the mesh in the canonical space share the same topology, we propose to learn latent codes on the posed mesh from multiple input images and then assign those latent codes to the mesh in the canonical space. Specifically, we first use normal and geometry networks to extract a feature vector for each vertex of the SMPLX mesh. Normal maps are adopted because they generalize better to unseen images than raw 2D images do. Then, the features for each vertex on the posed mesh, gathered from multiple images, are integrated by MLPs. The integrated features act as latent codes and are anchored to the SMPLX mesh in the canonical space. Finally, the latent code for each 3D point is extracted and used to compute the SDF. Our method for reconstructing the human shape in the canonical pose placed third in the WCPA MVP-Human Body Challenge.

Graph Representation Learning for Popularity Prediction Problem: A Survey

Tiantian Chen , Jianxiong Guo , Weili Wu

Category: Machine Learning

2022-03-15

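Online social platforms such as Twitter, Facebook, LinkedIn, and WeChat have grown very quickly over the past decade and are among the most effective platforms for people to communicate and share information with each other. Because of the "word of mouth" effect, information can usually spread rapidly on these social media platforms. It is therefore important to study the mechanisms that drive information diffusion and to quantify the consequences of information spread. Many efforts have focused on this problem to help us understand it better and achieve higher performance in viral marketing and advertising. Meanwhile, the development of neural networks has flourished in the last few years, leading to a large number of graph representation learning (GRL) models. Compared with traditional models, GRL methods are often shown to be more effective. In this paper, we present a comprehensive review of existing works that use GRL methods for the popularity prediction problem, and we divide the related literature into two broad classes according to the models and techniques they mainly use: embedding-based methods and deep learning methods. Deep learning methods are further divided into six subclasses: convolutional neural networks, graph convolutional networks, graph attention networks, graph neural networks, recurrent neural networks, and reinforcement learning. We compare the performance of these models and discuss their strengths and limitations. Finally, we outline the challenges and future opportunities for the popularity prediction problem.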

A Heterogeneous Graph Learning Model for Cyber-Attack Detection

Mingqi Lv , Chengyu Dong , Tieming Chen , Tiantian Zhu , Qijie Song , Yuan Fan

Category: Machine Learning

2021-12-16

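A cyber-attack is a malicious attempt by experienced hackers to breach a target information system. Cyber-attacks are usually characterized by hybrid TTPs (Tactics, Techniques, and Procedures) and long-term adversarial behavior, which makes traditional intrusion detection methods ineffective. Most existing cyber-attack detection systems are implemented with manually designed rules that draw on domain knowledge (e.g., threat models, threat intelligence). However, this process lacks intelligence and generalization ability. To address this limitation, this paper proposes an intelligent cyber-attack detection method based on provenance data. To detect cyber-attacks effectively and efficiently among the huge number of system events in provenance data, we first model the provenance data as a heterogeneous graph that captures the rich context information of each system entity (e.g., process, file, socket), and learn a semantic vector representation for each system entity. Then, we perform online cyber-attack detection by sampling a small, compact local graph from the heterogeneous graph and classifying the key system entities as malicious or benign. We conducted a series of experiments on two provenance datasets containing real cyber-attacks. The experimental results show that the proposed method outperforms other learning-based detection models and is competitive with state-of-the-art rule-based cyber-attack detection systems.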

A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness

Tiantian Feng , Rajat Hebbar , Nicholas Mehlman , Xuan Shi , Aditya Kommineni , and Shrikanth Narayanan

Category: Machine Learning

2022-12-18

Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to be considered more trustworthy for broader deployment. Specifically, concerns over privacy breaches, discriminating performance, and vulnerability to adversarial attacks have all been discovered in ML research fields. In order to address the above challenges and risks, a significant number of efforts have been made to ensure these ML systems are trustworthy, especially private, safe, and fair. In this paper, we conduct the first comprehensive survey on speech-centric trustworthy ML topics related to privacy, safety, and fairness. In addition to serving as a summary report for the research community, we point out several promising future research directions to inspire the researchers who wish to explore further in this area.

On Generalization and Regularization via Wasserstein Distributionally Robust Optimization

Qinyu Wu , Jonathan Yu-Meng Li , Tiantian Mao

Category: Machine Learning

2022-12-12

Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications as a powerful means to obtain solutions with favourable out-of-sample performances. Two compelling explanations for the success are the generalization bounds derived from Wasserstein DRO and the equivalency between Wasserstein DRO and the regularization scheme commonly applied in machine learning. Existing results on generalization bounds and the equivalency to regularization are largely limited to the setting where the Wasserstein ball is of a certain type and the decision criterion takes certain forms of an expected function. In this paper, we show that by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and the equivalency to regularization in a significantly broader setting where the Wasserstein ball can be of a general type and the decision criterion can be a general measure of risk, i.e., nonlinear in distributions. This allows for accommodating many important classification, regression, and risk minimization applications that have not been addressed to date using Wasserstein DRO. Our results are strong in that the generalization bounds do not suffer from the curse of dimensionality and the equivalency to regularization is exact. As a byproduct, our regularization results broaden considerably the class of Wasserstein DRO models that can be solved efficiently via regularization formulations.
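
As an illustration of what "equivalency to regularization" means, the display below states one well-known special case. It is a sketch under stated assumptions, not a result taken from this abstract: the loss $\ell$ is assumed convex and Lipschitz with constant $\mathrm{Lip}(\ell)$, the ball is a type-1 Wasserstein ball of radius $\varepsilon$ around the empirical distribution $\widehat{P}_N$ with transport cost given by a norm $\|\cdot\|$, the support is all of $\mathbb{R}^d$, and $\|\cdot\|_*$ denotes the dual norm. All symbols are introduced here for illustration only.

```latex
\sup_{Q \,:\, W_1(Q, \widehat{P}_N) \le \varepsilon}
  \mathbb{E}_{Q}\!\left[ \ell(\beta^\top \xi) \right]
\;=\;
\mathbb{E}_{\widehat{P}_N}\!\left[ \ell(\beta^\top \xi) \right]
\;+\; \varepsilon \, \mathrm{Lip}(\ell) \, \|\beta\|_*
```

Under these assumptions the worst-case expected loss equals the empirical loss plus a norm penalty on $\beta$. The abstract claims an exact equivalence of this kind for more general Wasserstein balls and for general risk measures in place of the expectation.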

Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization

Tiantian Zhang , Zichuan Lin , Yuxing Wang , Deheng Ye , Qiang Fu , Wei Yang , Xueqian Wang , Bin Liang , Bo Yuan , Xiu Li

Category: Machine Learning | Artificial Intelligence

2022-09-01

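A key challenge of continual reinforcement learning (CRL) in dynamic environments is to adapt the RL agent's behavior promptly as the environment changes over its lifetime, while minimizing catastrophic forgetting of what has been learned. To address this challenge, in this paper we propose DaCoRL, i.e., dynamics-adaptive continual RL. DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters the stream of stationary tasks in the dynamic environment into a series of contexts and uses an expandable multi-head neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as a context and formalize context inference as online Bayesian infinite Gaussian mixture clustering over environment features, resorting to online Bayesian inference to infer the posterior distribution over contexts. Under the assumption of a Chinese restaurant process prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed, without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multi-head neural network whose output layer is expanded in step with each newly instantiated context, together with a knowledge distillation regularization term that preserves performance on learned tasks. As a general framework that can be combined with various deep RL algorithms, DaCoRL shows consistent advantages over existing methods in stability, overall performance, and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.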
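
The Chinese restaurant process (CRP) prior mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard CRP assignment rule only; the function name `crp_prior`, the `counts` argument, and the concentration parameter `alpha` are assumptions made here, and the paper's full method combines such a prior with a Gaussian likelihood over environment features, which is not shown.

```python
from typing import List


def crp_prior(counts: List[int], alpha: float) -> List[float]:
    """Prior probabilities under a Chinese restaurant process.

    counts[k] is the number of past tasks assigned to existing context k
    (hypothetical bookkeeping). The returned list has one extra entry at
    the end: the probability of instantiating a new context.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(counts) + alpha
    # An existing context is chosen in proportion to its size.
    probs = [c / total for c in counts]
    # A new context is chosen in proportion to alpha.
    probs.append(alpha / total)
    return probs


if __name__ == "__main__":
    # Two existing contexts holding 3 and 1 tasks, with alpha = 1.
    print(crp_prior([3, 1], alpha=1.0))
```

A larger `alpha` makes a new context more likely, which is why no external signal of an environment change is needed: the option of opening a new context is always part of the prior.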

Knowing Where and What: Unified Word Block Pretraining for Document Understanding

Song Tao , Zijian Wang , Tiantian Fan , Canjie Luo , Can Huang

Category: Natural Language Processing | Artificial Intelligence

2022-07-28

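Extracting information from documents is challenging because of their complex layouts. Most previous studies develop multimodal pre-trained models in a self-supervised way. In this paper, we focus on learning embeddings of word blocks that contain both text and layout information, and we propose UTel, a language model with Unified TExt and Layout pre-training. Specifically, we propose two pre-training tasks: Surrounding Word Prediction (SWP) for layout learning, and Contrastive learning of Word Embeddings (CWE) for distinguishing different word blocks. In addition, we replace the commonly used 1D position embedding with a 1D clipped relative position embedding. In this way, joint training of Masked Layout-Language Modeling (MLLM) and the two newly proposed tasks lets semantic and spatial features interact in a unified way. Moreover, by removing the 1D position embedding, the proposed UTel can process sequences of arbitrary length while maintaining competitive performance. Extensive experimental results show that UTel learns better joint representations than previous methods on various downstream tasks, even though it requires no image modality. Code is available at https://github.com/taosong2019/utel.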
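
A clipped relative position embedding is usually indexed by the signed token offset, clamped to a fixed window. The sketch below shows that common indexing scheme only; it is an assumption about the general technique, not UTel's actual implementation, and the names `clipped_relative_positions` and `max_distance` are hypothetical.

```python
from typing import List


def clipped_relative_positions(seq_len: int, max_distance: int) -> List[List[int]]:
    """Build a table of embedding indices for every (query, key) pair.

    The offset j - i is clamped to [-max_distance, max_distance] and then
    shifted by max_distance so that indices fall in [0, 2 * max_distance].
    """
    table = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            offset = max(-max_distance, min(max_distance, j - i))
            row.append(offset + max_distance)
        table.append(row)
    return table


if __name__ == "__main__":
    for row in clipped_relative_positions(seq_len=4, max_distance=2):
        print(row)
```

Because the index depends only on the clamped offset, the embedding table has a fixed size of `2 * max_distance + 1` entries regardless of sequence length, which is consistent with the abstract's claim about handling sequences of arbitrary length.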

Audio-Visual MLP for Scoring Sport

Jingfei Xia , Mingchen Zhuge , Tiantian Geng , Shun Fan , Yuantai Wei , Zhenyu He , Feng Zheng

Category: Computer Vision

2022-03-08

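Figure skating scoring is a challenging task because it requires judging both the skaters' technical moves and their coordination with the background music. Prior learning-based work cannot solve it well for two reasons: 1) each move changes quickly, so simply applying traditional frame sampling loses a lot of valuable information, especially in videos lasting 3 to 5 minutes, which makes extremely long-range representation learning necessary; 2) prior methods rarely consider the critical audio-visual relationship in their models. We therefore introduce a multimodal MLP architecture named Skating-Mixer. It extends the MLP-Mixer-based framework to a multimodal setting and effectively learns long-term representations through our memory recurrent unit (MRU). Beyond the model, we also collected a high-quality audio-visual dataset, FS1000, which contains over 1000 videos covering 8 types of programs with 7 different rating metrics, surpassing other datasets in both quantity and diversity. Experiments show that the proposed method outperforms prior methods on all major metrics on the public Fis-V dataset and on our FS1000 dataset. In addition, we include an analysis that applies our method to recent competitions at the Beijing 2022 Winter Olympic Games, demonstrating that our method is robust.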

A Surrogate-Assisted Controller for Expensive Evolutionary Reinforcement Learning

Yuxing Wang , Tiantian Zhang , Yongzhe Chang , Bin Liang , Xueqian Wang , Bo Yuan

Category: Neural and Evolutionary Computing

2022-01-01

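The integration of reinforcement learning (RL) and evolutionary algorithms (EAs) aims to exploit the sample efficiency of the former together with the diversity and robustness of the latter. Recently, hybrid learning frameworks based on this principle have achieved great success in various challenging robot control tasks. In these methods, however, policies from the genetic population are evaluated through interactions with the real environment, which limits their applicability to computationally expensive problems. In this work, we propose the Surrogate-assisted Controller (SC), a novel and efficient module that can be integrated into existing frameworks to alleviate the computational burden of EAs by partially replacing the expensive policy evaluation. The key challenge in applying this module is to prevent the optimization process from being misled by spurious minima that the surrogate may introduce. To address this issue, we present two strategies that let SC control the workflow of hybrid frameworks. Experiments on six continuous control tasks from the OpenAI Gym platform show that SC not only significantly reduces the cost of fitness evaluation but also improves the performance of the original hybrid frameworks, which combine collaborative learning and evolutionary processes.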

Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings

Tiantian Feng , Hanieh Hashemi , Rajat Hebbar , Murali Annavaram , Shrikanth S. Narayanan

Category: Machine Learning

2021-12-26

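Speech emotion recognition (SER) processes speech signals to detect and characterize expressed and perceived emotions. Many SER application systems acquire speech data on the client side and transmit it to remote cloud platforms for inference and decision making. However, speech data carry information not only about the emotions conveyed in vocal expression but also about other sensitive demographic traits such as gender, age, and language background. It is therefore desirable for SER systems to classify emotion constructs while preventing unintended or improper inference of sensitive and demographic information. Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing their local data. This training approach appears secure and can improve privacy for SER. However, recent work has shown that FL approaches remain vulnerable to various privacy attacks, such as reconstruction attacks and membership inference attacks. Although most of that work has focused on computer vision applications, such information leakage also exists in SER systems trained with FL. To assess the information leakage of SER systems trained with FL, we propose an attribute inference attack framework that infers sensitive attribute information about clients from shared gradients or model parameters, corresponding to the FedSGD and FedAvg training algorithms, respectively. As a use case, we empirically evaluate our approach for predicting the client's gender on three SER benchmark datasets: IEMOCAP, CREMA-D, and MSP-Improv. We show that the attribute inference attack is achievable against SER systems trained with FL. We further find that most of the information leakage likely comes from the first layer of the SER model.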
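
To make concrete what is shared in the FedAvg setting named in the abstract, the sketch below shows the standard server-side aggregation step: a weighted average of client model parameters, with weights proportional to local dataset size. It is a minimal illustration under assumptions made here (plain Python lists for parameters, the hypothetical names `fed_avg`, `client_params`, `client_sizes`) and does not reproduce the paper's attack.

```python
from typing import Dict, List


def fed_avg(
    client_params: List[Dict[str, List[float]]],
    client_sizes: List[int],
) -> Dict[str, List[float]]:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    averaged: Dict[str, List[float]] = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][k] * n for p, n in zip(client_params, client_sizes)) / total
            for k in range(length)
        ]
    return averaged


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(fed_avg(clients, client_sizes=[1, 3]))
```

In this setting the server (or an eavesdropper) observes each client's parameter update before averaging, and in FedSGD it observes the gradients instead. Those per-client quantities are the inputs from which the abstract says sensitive attributes can be inferred.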
